US Covid-19 Cases: State- and County-Based Visualization

Name: Jingcheng Jiang

I used two datasets of US Covid-19 case statistics:

  1. Covid-19 cases by state.

Source: 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv' from The New York Times.

  2. Covid-19 cases by county.

Source: 'https://query.data.world/s/sc4gq2roysjsytksfhvhkoybk5xm2j' from Johns Hopkins.

In [1]:
# Import modules
import pandas as pd
import plotly.express as px
from urllib.request import urlopen
import json
In [2]:
# Read in datasets
states_url = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv'
states_df = pd.read_csv(states_url)
counties_df = pd.read_csv('https://query.data.world/s/sc4gq2roysjsytksfhvhkoybk5xm2j')
counties_df = counties_df.dropna(how='any',axis=0)
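
The `dropna(how='any', axis=0)` call above removes every row that has at least one missing value. As a minimal sketch of that behavior, the toy frame below (its rows, including the "Unassigned" entry, are hypothetical and not taken from the real file) counts how many rows such a call drops:

```python
import pandas as pd

# Hypothetical rows for illustration; the real data comes from the county CSV.
sample = pd.DataFrame({
    "county_name": ["Autauga", "Unassigned", "Baldwin"],
    "fips_code": [1001.0, None, 1003.0],
    "confirmed": [37, 5, 154],
})

# how="any" drops a row as soon as one of its columns is missing.
cleaned = sample.dropna(how="any", axis=0)
dropped = len(sample) - len(cleaned)
print(f"dropped {dropped} of {len(sample)} rows")
```

Running the same count on the real `counties_df` before and after cleaning would show how many counties are left out of the later plots.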

To identify each state's location in the plot below, we need to add an attribute to the dataset: the abbreviation of each state.

In [3]:
# Use the state code file as a reference for adding the state_code attribute
statenames = pd.read_csv("state_code.csv")
In [4]:
# Map each full state name to its abbreviation; names that are not in the
# reference file (e.g. some territories) keep their full name
code_by_state = dict(zip(statenames['State'], statenames['Code']))
states_df['state_code'] = states_df['state'].replace(code_by_state)
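
Any state name that does not appear in the reference file keeps its full name as `state_code` (in Out[5], for example, Puerto Rico and the Virgin Islands), and such values are probably not valid `'USA-states'` locations. The sketch below lists the unmapped names. The two small frames are hypothetical stand-ins, since the actual contents of `state_code.csv` are not shown here.

```python
import pandas as pd

# Toy stand-ins for the notebook's frames (assumed shapes, hypothetical rows).
states_df = pd.DataFrame(
    {"state": ["Illinois", "Puerto Rico", "Washington", "Virgin Islands"]}
)
statenames = pd.DataFrame(
    {"State": ["Illinois", "Washington"], "Code": ["IL", "WA"]}
)

# Names present in the case data but missing from the reference table.
unmapped = sorted(set(states_df["state"]) - set(statenames["State"]))
print(unmapped)
```

These rows could then be dropped or given codes by hand before plotting, depending on whether the territories should appear on the map.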

Take a look at the two datasets used.

In [5]:
states_df
Out[5]:
date state fips cases deaths state_code
0 2020-01-21 Washington 53 1 0 WA
1 2020-01-22 Washington 53 1 0 WA
2 2020-01-23 Washington 53 1 0 WA
3 2020-01-24 Illinois 17 1 0 IL
4 2020-01-24 Washington 53 1 0 WA
5 2020-01-25 California 6 1 0 CA
6 2020-01-25 Illinois 17 1 0 IL
7 2020-01-25 Washington 53 1 0 WA
8 2020-01-26 Arizona 4 1 0 AZ
9 2020-01-26 California 6 2 0 CA
10 2020-01-26 Illinois 17 1 0 IL
11 2020-01-26 Washington 53 1 0 WA
12 2020-01-27 Arizona 4 1 0 AZ
13 2020-01-27 California 6 2 0 CA
14 2020-01-27 Illinois 17 1 0 IL
15 2020-01-27 Washington 53 1 0 WA
16 2020-01-28 Arizona 4 1 0 AZ
17 2020-01-28 California 6 2 0 CA
18 2020-01-28 Illinois 17 1 0 IL
19 2020-01-28 Washington 53 1 0 WA
20 2020-01-29 Arizona 4 1 0 AZ
21 2020-01-29 California 6 2 0 CA
22 2020-01-29 Illinois 17 1 0 IL
23 2020-01-29 Washington 53 1 0 WA
24 2020-01-30 Arizona 4 1 0 AZ
25 2020-01-30 California 6 2 0 CA
26 2020-01-30 Illinois 17 2 0 IL
27 2020-01-30 Washington 53 1 0 WA
28 2020-01-31 Arizona 4 1 0 AZ
29 2020-01-31 California 6 3 0 CA
... ... ... ... ... ... ...
2954 2020-04-25 Mississippi 28 5718 221 MS
2955 2020-04-25 Missouri 29 6826 280 MO
2956 2020-04-25 Montana 30 445 14 MT
2957 2020-04-25 Nebraska 31 2902 53 NE
2958 2020-04-25 Nevada 32 4539 206 NV
2959 2020-04-25 New Hampshire 33 1793 60 NH
2960 2020-04-25 New Jersey 34 105523 5863 NJ
2961 2020-04-25 New Mexico 35 2662 93 NM
2962 2020-04-25 New York 36 282174 16599 NY
2963 2020-04-25 North Carolina 37 8623 300 NC
2964 2020-04-25 North Dakota 38 803 16 ND
2965 2020-04-25 Northern Mariana Islands 69 14 2 Northern Mariana Islands
2966 2020-04-25 Ohio 39 15587 711 OH
2967 2020-04-25 Oklahoma 40 3193 194 OK
2968 2020-04-25 Oregon 41 2253 87 OR
2969 2020-04-25 Pennsylvania 42 41626 1842 PA
2970 2020-04-25 Puerto Rico 72 1307 52 Puerto Rico
2971 2020-04-25 Rhode Island 44 7129 215 RI
2972 2020-04-25 South Carolina 45 5253 166 SC
2973 2020-04-25 South Dakota 46 2147 10 SD
2974 2020-04-25 Tennessee 47 8977 187 TN
2975 2020-04-25 Texas 48 24494 662 TX
2976 2020-04-25 Utah 49 3950 41 UT
2977 2020-04-25 Vermont 50 843 46 VT
2978 2020-04-25 Virgin Islands 78 55 3 Virgin Islands
2979 2020-04-25 Virginia 51 12366 436 VA
2980 2020-04-25 Washington 53 13484 743 WA
2981 2020-04-25 West Virginia 54 1025 33 WV
2982 2020-04-25 Wisconsin 55 5687 266 WI
2983 2020-04-25 Wyoming 56 362 7 WY

2984 rows × 6 columns

In [6]:
counties_df
Out[6]:
last_update state county_name county_name_long fips_code lat lon NCHS_urbanization total_population confirmed confirmed_per_100000 deaths deaths_per_100000
0 2020-04-26 18:30:52 Alabama Autauga Autauga, Alabama, US 1001.0 32.539527 -86.644082 Medium metro 55200.0 37 67.03 2 3.62
1 2020-04-26 18:30:52 Alabama Baldwin Baldwin, Alabama, US 1003.0 30.727750 -87.722071 Small metro 208107.0 154 74.00 3 1.44
2 2020-04-26 18:30:52 Alabama Barbour Barbour, Alabama, US 1005.0 31.868263 -85.387129 Non-core 25782.0 33 128.00 0 0.00
3 2020-04-26 18:30:52 Alabama Bibb Bibb, Alabama, US 1007.0 32.996421 -87.125115 Large fringe metro 22527.0 35 155.37 0 0.00
4 2020-04-26 18:30:52 Alabama Blount Blount, Alabama, US 1009.0 33.982109 -86.567906 Large fringe metro 57645.0 31 53.78 0 0.00
5 2020-04-26 18:30:52 Alabama Bullock Bullock, Alabama, US 1011.0 32.100305 -85.712655 Non-core 10352.0 12 115.92 0 0.00
6 2020-04-26 18:30:52 Alabama Butler Butler, Alabama, US 1013.0 31.753001 -86.680575 Non-core 20025.0 28 139.83 0 0.00
7 2020-04-26 18:30:52 Alabama Calhoun Calhoun, Alabama, US 1015.0 33.774837 -85.826304 Small metro 115098.0 90 78.19 3 2.61
8 2020-04-26 18:30:52 Alabama Chambers Chambers, Alabama, US 1017.0 32.913601 -85.390727 Micropolitan 33826.0 284 839.59 18 53.21
9 2020-04-26 18:30:52 Alabama Cherokee Cherokee, Alabama, US 1019.0 34.178060 -85.606390 Non-core 25853.0 12 46.42 0 0.00
10 2020-04-26 18:30:52 Alabama Chilton Chilton, Alabama, US 1021.0 32.850441 -86.717326 Large fringe metro 43930.0 49 111.54 1 2.28
11 2020-04-26 18:30:52 Alabama Choctaw Choctaw, Alabama, US 1023.0 32.022273 -88.265644 Non-core 13075.0 27 206.50 0 0.00
12 2020-04-26 18:30:52 Alabama Clarke Clarke, Alabama, US 1025.0 31.680999 -87.835486 Non-core 24387.0 25 102.51 1 4.10
13 2020-04-26 18:30:52 Alabama Clay Clay, Alabama, US 1027.0 33.269842 -85.858361 Non-core 13378.0 19 142.02 0 0.00
14 2020-04-26 18:30:52 Alabama Cleburne Cleburne, Alabama, US 1029.0 33.676792 -85.520059 Non-core 14938.0 12 80.33 1 6.69
15 2020-04-26 18:30:52 Alabama Coffee Coffee, Alabama, US 1031.0 31.399328 -85.989010 Micropolitan 51288.0 88 171.58 1 1.95
16 2020-04-26 18:30:52 Alabama Colbert Colbert, Alabama, US 1033.0 34.698475 -87.801685 Small metro 54495.0 23 42.21 2 3.67
17 2020-04-26 18:30:52 Alabama Conecuh Conecuh, Alabama, US 1035.0 31.434017 -86.993200 Non-core 12514.0 9 71.92 0 0.00
18 2020-04-26 18:30:52 Alabama Coosa Coosa, Alabama, US 1037.0 32.936901 -86.248477 Micropolitan 10855.0 29 267.16 1 9.21
19 2020-04-26 18:30:52 Alabama Covington Covington, Alabama, US 1039.0 31.247785 -86.450509 Non-core 37351.0 32 85.67 0 0.00
20 2020-04-26 18:30:52 Alabama Crenshaw Crenshaw, Alabama, US 1041.0 31.729418 -86.315931 Non-core 13865.0 10 72.12 0 0.00
21 2020-04-26 18:30:52 Alabama Cullman Cullman, Alabama, US 1043.0 34.130203 -86.868880 Micropolitan 82313.0 49 59.53 0 0.00
22 2020-04-26 18:30:52 Alabama Dale Dale, Alabama, US 1045.0 31.430371 -85.610957 Micropolitan 49255.0 25 50.76 0 0.00
23 2020-04-26 18:30:52 Alabama Dallas Dallas, Alabama, US 1047.0 32.326881 -87.108667 Micropolitan 40029.0 32 79.94 2 5.00
24 2020-04-26 18:30:52 Alabama DeKalb DeKalb, Alabama, US 1049.0 34.459469 -85.807829 Non-core 71200.0 63 88.48 2 2.81
25 2020-04-26 18:30:52 Alabama Elmore Elmore, Alabama, US 1051.0 32.597854 -86.144153 Medium metro 81212.0 77 94.81 1 1.23
26 2020-04-26 18:30:52 Alabama Escambia Escambia, Alabama, US 1053.0 31.125679 -87.159187 Non-core 37328.0 22 58.94 1 2.68
27 2020-04-26 18:30:52 Alabama Etowah Etowah, Alabama, US 1055.0 34.045673 -86.040519 Small metro 102939.0 123 119.49 8 7.77
28 2020-04-26 18:30:52 Alabama Fayette Fayette, Alabama, US 1057.0 33.720769 -87.738866 Non-core 16585.0 5 30.15 0 0.00
29 2020-04-26 18:30:52 Alabama Franklin Franklin, Alabama, US 1059.0 34.442353 -87.842895 Non-core 31542.0 38 120.47 0 0.00
... ... ... ... ... ... ... ... ... ... ... ... ... ...
2776 2020-04-26 18:30:52 Wisconsin Walworth Walworth, Wisconsin, US 55127.0 42.668582 -88.541631 Micropolitan 103013.0 132 128.14 8 7.77
2777 2020-04-26 18:30:52 Wisconsin Washburn Washburn, Wisconsin, US 55129.0 45.898386 -91.790504 Non-core 15689.0 1 6.37 0 0.00
2778 2020-04-26 18:30:52 Wisconsin Washington Washington, Wisconsin, US 55131.0 43.368637 -88.229747 Large fringe metro 134535.0 92 68.38 4 2.97
2779 2020-04-26 18:30:52 Wisconsin Waukesha Waukesha, Wisconsin, US 55133.0 43.018331 -88.304312 Large fringe metro 398879.0 299 74.96 14 3.51
2780 2020-04-26 18:30:52 Wisconsin Waupaca Waupaca, Wisconsin, US 55135.0 44.470681 -88.965345 Non-core 51444.0 7 13.61 1 1.94
2781 2020-04-26 18:30:52 Wisconsin Waushara Waushara, Wisconsin, US 55137.0 44.113244 -89.243171 Non-core 24116.0 2 8.29 0 0.00
2782 2020-04-26 18:30:52 Wisconsin Winnebago Winnebago, Wisconsin, US 55139.0 44.068869 -88.644771 Small metro 169926.0 48 28.25 1 0.59
2783 2020-04-26 18:30:52 Wisconsin Wood Wood, Wisconsin, US 55141.0 44.455379 -90.041583 Micropolitan 73274.0 2 2.73 0 0.00
2784 2020-04-26 18:30:52 Wyoming Albany Albany, Wyoming, US 56001.0 41.654987 -105.723542 Micropolitan 38102.0 6 15.75 0 0.00
2785 2020-04-26 18:30:52 Wyoming Big Horn Big Horn, Wyoming, US 56003.0 44.524051 -107.996037 Non-core 11901.0 2 16.81 0 0.00
2786 2020-04-26 18:30:52 Wyoming Campbell Campbell, Wyoming, US 56005.0 44.248861 -105.547440 Micropolitan 47708.0 23 48.21 0 0.00
2787 2020-04-26 18:30:52 Wyoming Carbon Carbon, Wyoming, US 56007.0 41.693578 -106.932608 Non-core 15477.0 4 25.84 0 0.00
2788 2020-04-26 18:30:52 Wyoming Converse Converse, Wyoming, US 56009.0 42.972723 -105.508185 Non-core 13997.0 16 114.31 0 0.00
2789 2020-04-26 18:30:52 Wyoming Crook Crook, Wyoming, US 56011.0 44.588551 -104.569770 Non-core 7410.0 5 67.48 0 0.00
2790 2020-04-26 18:30:52 Wyoming Fremont Fremont, Wyoming, US 56013.0 43.041840 -108.629689 Micropolitan 40076.0 83 207.11 0 0.00
2791 2020-04-26 18:30:52 Wyoming Goshen Goshen, Wyoming, US 56015.0 42.087982 -104.353474 Non-core 13438.0 4 29.77 0 0.00
2792 2020-04-26 18:30:52 Wyoming Hot Springs Hot Springs, Wyoming, US 56017.0 43.719307 -108.442317 Non-core 4680.0 3 64.10 0 0.00
2793 2020-04-26 18:30:52 Wyoming Johnson Johnson, Wyoming, US 56019.0 44.040572 -106.584517 Non-core 8515.0 15 176.16 1 11.74
2794 2020-04-26 18:30:52 Wyoming Laramie Laramie, Wyoming, US 56021.0 41.307025 -104.688750 Small metro 97692.0 124 126.93 0 0.00
2795 2020-04-26 18:30:52 Wyoming Lincoln Lincoln, Wyoming, US 56023.0 42.263764 -110.656400 Non-core 19011.0 9 47.34 0 0.00
2796 2020-04-26 18:30:52 Wyoming Natrona Natrona, Wyoming, US 56025.0 42.961801 -106.797885 Small metro 80610.0 49 60.79 0 0.00
2797 2020-04-26 18:30:52 Wyoming Niobrara Niobrara, Wyoming, US 56027.0 43.056077 -104.475890 Non-core 2448.0 2 81.70 0 0.00
2798 2020-04-26 18:30:52 Wyoming Park Park, Wyoming, US 56029.0 44.521575 -109.585282 Non-core 29121.0 1 3.43 0 0.00
2799 2020-04-26 18:30:52 Wyoming Platte Platte, Wyoming, US 56031.0 42.132991 -104.966331 Non-core 8673.0 0 0.00 0 0.00
2800 2020-04-26 18:30:52 Wyoming Sheridan Sheridan, Wyoming, US 56033.0 44.790489 -106.886239 Micropolitan 30012.0 16 53.31 0 0.00
2801 2020-04-26 18:30:52 Wyoming Sublette Sublette, Wyoming, US 56035.0 42.765583 -109.913092 Non-core 9951.0 3 30.15 0 0.00
2802 2020-04-26 18:30:52 Wyoming Sweetwater Sweetwater, Wyoming, US 56037.0 41.659439 -108.882788 Micropolitan 44117.0 16 36.27 0 0.00
2803 2020-04-26 18:30:52 Wyoming Teton Teton, Wyoming, US 56039.0 43.935225 -110.589080 Micropolitan 23059.0 95 411.99 0 0.00
2804 2020-04-26 18:30:52 Wyoming Uinta Uinta, Wyoming, US 56041.0 41.287818 -110.547578 Micropolitan 20609.0 7 33.97 0 0.00
2805 2020-04-26 18:30:52 Wyoming Washakie Washakie, Wyoming, US 56043.0 43.904516 -107.680187 Non-core 8129.0 8 98.41 0 0.00

2806 rows × 13 columns

Since the state-level dataset is organized by date, I decided to make a scatter map plot with an interactive timeline.

By dragging the timeline slider in the visualization, users can clearly see the growth of cases in each state from January 21 to April 25, 2020.

In addition, by hovering the mouse over each state on the map plot, users can see labels including the state name, the number of confirmed cases, and the number of deaths.

The brighter the color and the larger the bubble, the more confirmed cases in the state.

In [7]:
fig = px.scatter_geo(states_df, locations = 'state_code',
                    locationmode = 'USA-states', color = 'cases',
                    color_continuous_scale = px.colors.sequential.Agsunset,
                    hover_name = 'state', size = 'cases',size_max = 80,
                    hover_data = ['deaths'], scope = 'usa',
                    title = 'USA Covid-19 Cases_States Based',
                    animation_frame = 'date')
fig.show()
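
The last frame of the animation can also be read as a ranking. The sketch below uses a handful of rows copied from Out[5] (not the full dataset) to show one way to select the latest date and sort the states by confirmed cases.

```python
import pandas as pd

# A few rows copied from the Out[5] listing: date, state, cases, deaths.
rows = [
    ("2020-04-25", "New Jersey", 105523, 5863),
    ("2020-04-25", "New York", 282174, 16599),
    ("2020-04-25", "Pennsylvania", 41626, 1842),
    ("2020-04-25", "Texas", 24494, 662),
    ("2020-01-31", "California", 3, 0),
]
sample = pd.DataFrame(rows, columns=["date", "state", "cases", "deaths"])

# ISO-formatted dates sort correctly as strings, so max() is the latest day.
latest = sample[sample["date"] == sample["date"].max()]
ranking = latest.sort_values("cases", ascending=False)["state"].tolist()
print(ranking)
```

Applied to the full `states_df`, the same two steps would give the complete state ranking for the most recent date.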

In the following part, we explore the most recent data for each county (a snapshot last updated on April 26, 2020).

To locate each county in the dataset on the US map, we need its FIPS code, which is the unique identifier for each county in the US.

In [8]:
from urllib.request import urlopen
import json
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)
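
A county is only drawn when its FIPS code matches the `id` of a GeoJSON feature. The sketch below assumes the feature ids are zero-padded 5-character strings such as "01001", while the dataset stores the codes as floats such as 1001.0. The toy GeoJSON, the helper `to_fips_id`, and the code 99999.0 are hypothetical and only illustrate the check.

```python
# Toy GeoJSON with the same overall shape as a county boundary file
# (geometry omitted; ids assumed to be zero-padded FIPS strings).
counties = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "01001", "properties": {"NAME": "Autauga"}},
        {"type": "Feature", "id": "56043", "properties": {"NAME": "Washakie"}},
    ],
}
feature_ids = {feature["id"] for feature in counties["features"]}


def to_fips_id(code: float) -> str:
    """Convert a numeric FIPS code such as 1001.0 to the string '01001'."""
    return f"{int(code):05d}"


# Codes as they appear in the county dataset, plus one made-up code.
fips_codes = [1001.0, 56043.0, 99999.0]
unmatched = [code for code in fips_codes if to_fips_id(code) not in feature_ids]
print(unmatched)
```

A non-empty `unmatched` list on the real data would point to counties that cannot be placed on the map.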

After reading in the county boundaries, which are keyed by FIPS code, we can make our second visualization: county-based Covid-19 cases.

Although data for some counties is temporarily missing, we can see the epidemic situation of each county from the shade of its color.

The darker the color, the more confirmed cases; the color scale is capped at 200, so every county above that value gets the darkest shade. Similarly, when users hover the mouse over a county, they see a label containing the county and state name, the number of confirmed cases, and the number of deaths.

In [9]:
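# The GeoJSON feature ids are assumed to be zero-padded 5-character strings
# (e.g. "01001"), so convert the float FIPS codes to that form before matching
counties_df['fips'] = counties_df['fips_code'].astype(int).astype(str).str.zfill(5)
fig = px.choropleth_mapbox(counties_df,geojson=counties,locations='fips', color='confirmed',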
                           color_continuous_scale=px.colors.sequential.Blues,
                           range_color=[0, 200],
                           mapbox_style="carto-positron",
                           zoom=3, 
                           center = {"lat": 37, "lon": -95},
                           hover_name = 'county_name_long',
                           hover_data = ['deaths'],
                           title = 'USA Covid-19 Cases_County Based')

fig.update_layout(coloraxis_colorbar = 
                  dict(tickvals=[0,50,100,150,200],
                      ticktext = ['0','50','100','150','>200']))
fig.show()

The following visualization shows the confirmed cases and deaths in each county.

In the scatter plot, users can check the number of confirmed cases, the number of deaths, and the urbanization level of each county by hovering the mouse over its point.

From this visualization, we can see that the counties with the largest numbers of confirmed cases and deaths are tagged as either "Large central metro" or "Large fringe metro".

Note: To make the visualization clearer, I excluded the counties with 40,000 or more confirmed cases.

In [10]:
new_counties_df = counties_df[counties_df["confirmed"]<40000]
In [11]:
fig = px.scatter(new_counties_df, x="confirmed", y="deaths",
                 hover_name = 'county_name_long',
                 hover_data = ['NCHS_urbanization'],
                title = "Confirmed Cases and Deaths in Each County")
fig.show()

Finally, I made a bar chart of the total confirmed cases for each urbanization level of US counties.

In [12]:
# Get the grouped data
total_confirmed = counties_df.groupby("NCHS_urbanization")["confirmed"].sum().reset_index()
total_confirmed
Out[12]:
NCHS_urbanization confirmed
0 Large central metro 402639
1 Large fringe metro 345408
2 Medium metro 108097
3 Micropolitan 31739
4 Non-core 17859
5 Small metro 40844

The bar chart below shows the relationship between the total number of confirmed cases and the degree of urbanization: the more urbanized the category, the worse the epidemic situation appears to be.

In [13]:
fig = px.bar(total_confirmed, 
             x='NCHS_urbanization', 
             y='confirmed',
             color = 'confirmed',
            title = "Total Confirmed Cases by Urbanization Level of US Counties")
fig.show()
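
The grouped table lists the categories alphabetically, which puts "Small metro" after "Non-core". As a sketch, the categories can be sorted from most to least urban before plotting, so that the trend is easier to read. The order in `urban_order` follows the six-level NCHS scheme, and the totals are copied from Out[12].

```python
import pandas as pd

# Totals copied from Out[12].
total_confirmed = pd.DataFrame({
    "NCHS_urbanization": ["Large central metro", "Large fringe metro",
                          "Medium metro", "Micropolitan", "Non-core",
                          "Small metro"],
    "confirmed": [402639, 345408, 108097, 31739, 17859, 40844],
})

# Six NCHS levels, from most to least urban.
urban_order = ["Large central metro", "Large fringe metro", "Medium metro",
               "Small metro", "Micropolitan", "Non-core"]

# An ordered categorical makes sort_values follow urban_order, not the alphabet.
total_confirmed["NCHS_urbanization"] = pd.Categorical(
    total_confirmed["NCHS_urbanization"], categories=urban_order, ordered=True
)
ordered = total_confirmed.sort_values("NCHS_urbanization").reset_index(drop=True)
print(ordered)
```

Another option would be to pass `category_orders={"NCHS_urbanization": urban_order}` to `px.bar` and leave the table itself unsorted.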

Main Findings and Conclusion:

  1. The US epidemic expanded rapidly in March 2020, and East and West Coast states had the most severe outbreaks. As of April 25, 2020, New York and New Jersey ranked first and second in confirmed cases.

  2. Counties with large numbers of confirmed cases and deaths are labeled as either "Large central metro" or "Large fringe metro".

  3. The total number of confirmed cases in counties labeled "Large central metro" and "Large fringe metro" is much higher than in counties at other urbanization levels; together they account for about 79% of the confirmed cases in the county dataset.
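
The share in finding 3 can be checked directly from the totals in Out[12]. The sketch below only redoes that arithmetic on the six printed totals; it does not reload the data.

```python
# Totals copied from Out[12].
totals = {
    "Large central metro": 402639,
    "Large fringe metro": 345408,
    "Medium metro": 108097,
    "Micropolitan": 31739,
    "Non-core": 17859,
    "Small metro": 40844,
}

grand_total = sum(totals.values())
large_metro = totals["Large central metro"] + totals["Large fringe metro"]
share = large_metro / grand_total

print(f"{large_metro} of {grand_total} confirmed cases ({share:.1%})")
```

The two large metro categories together hold 748,047 of the 946,586 confirmed cases in the cleaned county data, or roughly 79%.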

Thanks for reading! If you have any questions, please feel free to contact me by email at jj21@illinois.edu.
